Update MG negative sampling to return random samples distributed as specified #4885
Conversation
Shouldn't we update the documentation?
Just pushed an update to the documentation.
Can we hold this PR until I have a corresponding one in cuGraph-GNN?
```cpp
if constexpr (multi_gpu) {
  samples_per_gpu = host_scalar_allgather(handle.get_comms(), num_samples, handle.get_stream());
  handle.sync_stream();
```
This `sync_stream` call is unnecessary.
Fixed in the next push.
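For context, a minimal sketch of the pattern under discussion. `samples_per_gpu` is read on the host in the loop below, so `host_scalar_allgather` evidently returns its gathered scalars in host memory; presumably it must synchronize internally before returning, which is why the extra caller-side sync is redundant:

```cpp
// Sketch only -- not the PR's code. host_scalar_allgather gathers one scalar
// from every rank and hands the results back in host memory, so its return
// value is already valid on the host when the call returns.
std::vector<size_t> samples_per_gpu =
  host_scalar_allgather(handle.get_comms(), num_samples, handle.get_stream());
// handle.sync_stream();  // <- the redundant call flagged in the review
```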
```cpp
while (reduction > 0) {
  size_t est_reduction_per_gpu = (reduction + comm_size - 1) / comm_size;
  for (size_t i = 0; i < samples_per_gpu.size(); ++i) {
    if (samples_per_gpu[i] > est_reduction_per_gpu) {
      samples_per_gpu[i] -= est_reduction_per_gpu;
      reduction -= est_reduction_per_gpu;
    } else {
      reduction -= samples_per_gpu[i];
      samples_per_gpu[i] = 0;
    }
```
I think this logic has a flaw.
Say `reduction = 3` & `comm_size = 2`. Then, `est_reduction_per_gpu = 2`.
If `samples_per_gpu[] > 2`, `reduction = 3 - 2 - 2` = overflow!
This is (I hope) an uncommon case; usually, I believe, it is called with `exact_number_of_samples == true`. The code's probably not optimal... I was mostly looking for functional.

The next line should correct the edge condition you're concerned about. When `i` is 0, `reduction = 3`, `comm_size = 2`, so `set_reduction_per_gpu = 2`. We'll take the first branch and reduce `reduction` to 1. The next line will then trigger and reduce `set_reduction_per_gpu` to 1.
> reduce set_reduction_per_gpu to 1.

You mean `est_reduction_per_gpu`? This is set before the for loop. Once it is set to 2 in line 460, it will be 2 till the end of the for loop in line 416. Am I missing something?
Oh, sorry, I missed line 470.
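To close out this sub-thread, here is a self-contained sketch (not the PR's actual code) of the control flow being discussed. The `std::min` clamp is an assumption standing in for "the next line" at line 470, which is not shown in the quoted hunk; with it, the `reduction = 3`, `comm_size = 2` case walks through without wrapping below zero:

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical reconstruction of the loop under discussion. The clamp plays
// the role of "the next line": it keeps est_reduction_per_gpu from exceeding
// what is still left to remove, so the unsigned counter never wraps.
void distribute_reduction(std::vector<std::size_t>& samples_per_gpu, std::size_t reduction)
{
  std::size_t comm_size = samples_per_gpu.size();
  while (reduction > 0) {
    std::size_t est_reduction_per_gpu = (reduction + comm_size - 1) / comm_size;
    for (std::size_t i = 0; i < samples_per_gpu.size(); ++i) {
      if (samples_per_gpu[i] > est_reduction_per_gpu) {
        samples_per_gpu[i] -= est_reduction_per_gpu;
        reduction -= est_reduction_per_gpu;
      } else {
        reduction -= samples_per_gpu[i];
        samples_per_gpu[i] = 0;
      }
      // Assumed clamp ("the next line"): never plan to remove more than remains.
      est_reduction_per_gpu = std::min(est_reduction_per_gpu, reduction);
    }
  }
}

int main()
{
  std::vector<std::size_t> samples_per_gpu{10, 10};  // the case from the thread
  distribute_reduction(samples_per_gpu, 3);          // reduction = 3, comm_size = 2
  std::cout << samples_per_gpu[0] << ' ' << samples_per_gpu[1] << '\n';  // prints: 8 9
}
```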
```cpp
rmm::device_uvector<int> reduced_ranks(comm_size, handle.get_stream());
rmm::device_uvector<size_t> reduced_counts(comm_size, handle.get_stream());

reduced_ranks.resize(
  thrust::distance(reduced_ranks.begin(),
                   thrust::reduce_by_key(handle.get_thrust_policy(),
                                         gpu_assignment.begin(),
                                         gpu_assignment.end(),
                                         thrust::make_constant_iterator(size_t{1}),
                                         reduced_ranks.begin(),
                                         reduced_counts.begin(),
                                         thrust::equal_to<int>())
                     .first),
  handle.get_stream());
reduced_counts.resize(reduced_ranks.size(), handle.get_stream());

rmm::device_uvector<size_t> send_count(comm_size, handle.get_stream());
thrust::fill(handle.get_thrust_policy(), send_count.begin(), send_count.end(), 0);
thrust::scatter(handle.get_thrust_policy(),
                reduced_counts.begin(),
                reduced_counts.end(),
                reduced_ranks.begin(),
                send_count.begin());
```
`gpu_assignment` is already sorted.
We can use `thrust::upper_bound` to find boundaries. This will require just `comm_size` binary searches and will be much faster.
Fixed in the next push.
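A sketch of what the suggested approach could look like (a guess at the shape of the fix, not the actual pushed commit). Since `gpu_assignment` is sorted, one vectorized `thrust::upper_bound` over the rank values finds each rank's end offset, and `thrust::adjacent_difference` turns offsets into counts:

```cpp
#include <thrust/adjacent_difference.h>
#include <thrust/binary_search.h>
#include <thrust/iterator/counting_iterator.h>

// Sketch only. Assumes gpu_assignment holds int ranks in [0, comm_size),
// sorted ascending, as stated in the review comment above.
rmm::device_uvector<size_t> boundaries(comm_size, handle.get_stream());
rmm::device_uvector<size_t> send_count(comm_size, handle.get_stream());

// boundaries[r] = index one past the last element equal to r, i.e. the end of
// rank r's contiguous run: comm_size binary searches in total.
thrust::upper_bound(handle.get_thrust_policy(),
                    gpu_assignment.begin(),
                    gpu_assignment.end(),
                    thrust::make_counting_iterator(int{0}),
                    thrust::make_counting_iterator(static_cast<int>(comm_size)),
                    boundaries.begin());

// send_count[r] = boundaries[r] - boundaries[r - 1], with boundaries[-1] == 0.
// adjacent_difference copies the first element through unchanged, which is
// exactly rank 0's count.
thrust::adjacent_difference(handle.get_thrust_policy(),
                            boundaries.begin(),
                            boundaries.end(),
                            send_count.begin());
```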
LGTM
/merge
@ChuckHastings doesn't this affect negative sampling in the C API? So once this merges, I'll have to update …
This changes the semantic meaning of the parameter. Prior to this change, if you wanted this to run against an MG graph you would specify the total number of samples across all GPUs as the parameter for …. So, depending on what you implemented in …
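To make the change concrete with assumed numbers (the call below is a placeholder sketch, not the actual C/C++ signature), suppose 4 GPUs want 1,000,000 negative samples in total:

```cpp
// Placeholder illustration only -- "negative_sampling" and "num_samples" are
// stand-ins for the real cugraph API, whose exact signature is not shown in
// this thread.

// Before this PR: every rank passed the GLOBAL total.
//   auto result = negative_sampling(handle, graph, /*num_samples=*/1'000'000, ...);

// After this PR: each rank passes the count it wants returned LOCALLY,
// e.g. an even split across the 4 GPUs.
//   auto result = negative_sampling(handle, graph, /*num_samples=*/250'000, ...);
```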
Adds a heterogeneous link prediction example for cuGraph-PyG that uses the Taobao dataset, loosely based on the Taobao example from the PyG repository.

- Adds the ability to specify fanout as a dictionary, to better align with the PyG API.
- Fixes a bug where the number of negative samples was calculated incorrectly, causing additional unwanted negative samples to be generated.
- Updates the negative sampling call to match the new behavior added in rapidsai/cugraph#4885.

Merge after rapidsai/cugraph#4898

Authors:
- Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
- Tingyu Wang (https://github.com/tingyu66)
- Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #104
Modifies the new negative sampling interface so that, when called from MG, each rank specifies how many samples it wishes to receive, and the samples are randomly distributed across the calling GPUs.
Marked breaking as it changes the C++ interface... although nothing uses it yet.
Closes #4672